Why does b' (and sometimes b' ') show up when I split some HTML source? [Python]

Posted by Oliver on Stack Overflow
Published on 2011-11-12T04:05:35Z

I'm fairly new to Python and programming in general. I have done a few tutorials and am about 2/3 through a pretty good book. I've been trying to get more comfortable with Python and programming by just trying out things in the standard library.

That being said, I have recently run into a weird quirk that I'm sure is the result of my own incorrect or un-"pythonic" use of the urllib module (with Python 3.2.2):

import urllib.request

HTML_source = urllib.request.urlopen("http://www.somelink.com").read()

print(HTML_source)

When I run this in the interactive interpreter, it prints the HTML source of somelink, but prefixed with b', for example:

b'<HTML>\r\n<HEAD> (etc). . . .

If I split the string into a list on whitespace, every item is prefixed with b' as well.
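
To make the behaviour easier to reproduce without a network connection, here is a minimal sketch. It assumes that read() hands back a bytes object rather than a str, which is what the b' prefix suggests. The sample literal is made up for illustration, and UTF-8 is only a guess at the page's encoding.

```python
# Stand-in for what urlopen(...).read() appears to return: bytes, not str.
# The literal below is a made-up sample, not real page content.
html_source = b"<HTML>\r\n<HEAD> <TITLE>x</TITLE>"

print(type(html_source))     # shows the type of the object
print(html_source)           # the repr of a bytes object carries the b'...' prefix

# Splitting a bytes object on whitespace yields a list of bytes objects,
# so each item is displayed with its own b'...' prefix.
parts = html_source.split()
print(parts)

# Decoding converts bytes to str; "utf-8" is an assumption about the encoding.
text = html_source.decode("utf-8")
text_parts = text.split()
print(text_parts)
```

With the decoded text, the split items are ordinary str values and print without the prefix.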

I'm not trying to accomplish anything specific, just trying to familiarize myself with the standard library. I would like to know why this b' prefix appears.

Bonus question: is there a better way to get the HTML source WITHOUT using a third-party module? I know all that jazz about not reinventing the wheel, but I'm trying to learn by "building my own tools".

Thanks in Advance!

© Stack Overflow or respective owner
